Edward Swing, Vision Systems & Technology, Inc.
Student Team: NO
The Prajna Project is a Java toolkit designed to provide various capabilities for visualization, knowledge representation, geographic displays, semantic reasoning, and data fusion. Rather than attempt to recreate the significant capabilities provided in other tools, Prajna instead provides software bridges to incorporate other toolkits where appropriate. Prajna will be released to the Open Source community in the near future.
For this challenge, I developed a custom application using the Prajna Project, which provided the utilities for reading the data file. I used the graph display capabilities within Prajna and designed several graph arrangements intended to highlight particular features of the social network. I also used the Prajna software bridge to Prefuse to incorporate a force-directed arrangement, which displays the social network in a more aesthetically pleasing layout. Prefuse is an open-source visualization toolkit, primarily developed by Jeffrey Heer.
In addition, the software bridge to JFreeChart provided the charting capability for statistical displays. JFreeChart is an open-source project that provides a Java toolkit for displaying graphical charts. Prajna includes a GUI component that allows dynamic selection of different chart displays based on the data fields. Finally, to provide a geographic display, I used the geographic canvas available in the Prajna Project, projecting the network onto estimated geographic coordinates for the cell-phone towers.
The Prajna Project is a toolkit developed by Edward Swing. The custom application was built at VSTI. Other VSTI programs have since incorporated some of the new components that were developed for this contest.
To analyze the data, I displayed the social network using both the Prefuse force-directed display and the Prajna graph arrangement algorithms. By filtering on dates, selecting particular nodes of the graph, and limiting the graph radius, I deduced several significant clues. I corroborated these theories by displaying call patterns in a bar chart and by using a geographic map to show any significant changes in location.
I took the provided information, which identified Ferdinando Catalano as ID 200, to be factual at the beginning of the period. Using this, I displayed a social network graph with ID 200 as the central node and a radius of 2. There were obviously several centers of activity around his immediate associates, in particular nodes 1, 2, 3, 5, and 306. Reducing the radius to 1 and viewing the number of calls suggested that ID 5 was Ferdinando's brother Estaban, reportedly the person Ferdinando was in contact with most often. The number of immediate connections to ID 1 suggested that this might be David Vidro. IDs 2 and 3 would likely be Juan and Jorge Vidro.
Force-Directed Display of Catalano/Vidro social structure on June 1. Graph displayed with radius 2 around node 200. Node 1 highlighted, with immediate neighbors shown in gold.
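The radius-limited view described above amounts to collecting every node within a fixed number of hops of a center node, then ranking that node's direct contacts by call count. The Java sketch below shows one way to do this with a breadth-first search; it is not the Prajna API. The class `EgoNetwork` and its `Call` record (caller and receiver IDs only) are hypothetical, and the sketch assumes "radius" means hop distance in an undirected call graph.

```java
import java.util.*;

// Hypothetical helper (not Prajna's API): radius-limited views of a call graph.
final class EgoNetwork {
    // Assumed record layout: caller ID and receiver ID.
    record Call(int from, int to) {}

    // Build an undirected adjacency map from the call records.
    static Map<Integer, Set<Integer>> adjacency(List<Call> calls) {
        Map<Integer, Set<Integer>> adj = new HashMap<>();
        for (Call c : calls) {
            adj.computeIfAbsent(c.from(), k -> new HashSet<>()).add(c.to());
            adj.computeIfAbsent(c.to(), k -> new HashSet<>()).add(c.from());
        }
        return adj;
    }

    // Breadth-first search from center, stopping after `radius` hops.
    static Set<Integer> withinRadius(Map<Integer, Set<Integer>> adj, int center, int radius) {
        Set<Integer> seen = new HashSet<>(List.of(center));
        Deque<Integer> frontier = new ArrayDeque<>(List.of(center));
        for (int hop = 0; hop < radius && !frontier.isEmpty(); hop++) {
            Deque<Integer> next = new ArrayDeque<>();
            for (int node : frontier) {
                for (int nb : adj.getOrDefault(node, Set.of())) {
                    if (seen.add(nb)) next.add(nb); // first visit: expand on the next hop
                }
            }
            frontier = next;
        }
        return seen;
    }

    // Number of calls between the center and each direct contact,
    // in either direction, for ranking the most frequent contacts.
    static Map<Integer, Integer> callCounts(List<Call> calls, int center) {
        Map<Integer, Integer> counts = new HashMap<>();
        for (Call c : calls) {
            if (c.from() == center) counts.merge(c.to(), 1, Integer::sum);
            else if (c.to() == center) counts.merge(c.from(), 1, Integer::sum);
        }
        return counts;
    }
}
```

With real data, `withinRadius(adjacency(calls), 200, 2)` would give the radius-2 view around ID 200, and the largest value in `callCounts(calls, 200)` would point to the most frequent contact.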
Next, I filtered the phone calls by frequency to identify which people were in contact every day, or on most days. Unfortunately, this yielded little pertinent information: there were a small number of daily contacts, but the primary nodes were not among them. I therefore reviewed the network on a daily basis.
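The frequency filter can be expressed as counting the distinct days on which each pair of IDs spoke. A minimal sketch, assuming each call record carries a day number from 1 to 10; `DailyContacts`, `Call`, and `Pair` are illustrative names, not part of Prajna:

```java
import java.util.*;

// Hypothetical helper: which pairs of IDs were in contact on many days?
final class DailyContacts {
    // Assumed record layout: caller, receiver, and day of the period (1-10).
    record Call(int from, int to, int day) {}

    // Order-independent key for an undirected pair of IDs.
    record Pair(int a, int b) {
        static Pair of(int x, int y) { return x < y ? new Pair(x, y) : new Pair(y, x); }
    }

    // Pairs that spoke on at least minDays distinct days, in either direction.
    static Set<Pair> frequentPairs(List<Call> calls, int minDays) {
        Map<Pair, Set<Integer>> days = new HashMap<>();
        for (Call c : calls) {
            days.computeIfAbsent(Pair.of(c.from(), c.to()), k -> new HashSet<>()).add(c.day());
        }
        Set<Pair> result = new HashSet<>();
        for (var e : days.entrySet()) {
            if (e.getValue().size() >= minDays) result.add(e.getKey());
        }
        return result;
    }
}
```

Calling `frequentPairs(calls, 10)` would list the every-day contacts, and a lower threshold such as 7 would capture contacts on most days.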
In the force-directed layout for the entire 10-day period, nodes 5 and 306 were arranged very close together but did not actually connect with each other. However, the neighbors of 5 and 306 matched almost exactly. This immediately caught my attention, and I decided to investigate further. When I restricted the graph to display only the calls on June 1, ID 306 disappeared from the social network. Conversely, when only the records from June 10 were displayed, the activity of node 200 (and of its immediate neighbors 1, 2, 3, and 5) almost disappeared from the graph. Furthermore, several other nodes with many connections appeared, such as 309 and 360.
To help understand the temporal patterns further, I added a chart of calls over time for the current network or for a single individual. Focusing on ID 5, the chart shows significant activity up through June 7, after which the calls drop to near nothing for the rest of the period. IDs 200, 1, 2, and 3 follow similar patterns. Nodes 306, 309, and 360 show just the opposite, with little or no communication until June 8. This suggested either an upheaval in the leadership of the organization or that the leaders switched cell phones. Since the contacts of 5 and 306 matched almost exactly, it seemed more likely that 5 had simply switched his cell phone to 306.
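Restricting the graph to a single day is a simple filter on the call records, after which per-ID call counts serve as a rough activity level. A hypothetical sketch under the same assumed record layout (caller, receiver, day); `DayFilter` is an illustrative name:

```java
import java.util.*;

// Hypothetical helper: restrict the call records to one day and measure activity.
final class DayFilter {
    // Assumed record layout: caller, receiver, and day of the period (1-10).
    record Call(int from, int to, int day) {}

    // Keep only the calls made on the given day.
    static List<Call> onDay(List<Call> calls, int day) {
        return calls.stream().filter(c -> c.day() == day).toList();
    }

    // Activity level: number of calls each ID made or received.
    static Map<Integer, Integer> activity(List<Call> calls) {
        Map<Integer, Integer> counts = new TreeMap<>();
        for (Call c : calls) {
            counts.merge(c.from(), 1, Integer::sum);
            counts.merge(c.to(), 1, Integer::sum);
        }
        return counts;
    }
}
```

An ID that is missing from `activity(onDay(calls, 1))` but busy in `activity(onDay(calls, 10))` is the kind of change described above for 306.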
Calling patterns of Node 5 (left) and Node 306 (right). Red indicates calls in which 5/306 was the caller; blue indicates calls from other individuals to him.
At this point, I decided to investigate the other nodes that seemingly changed. Examining the neighbors of 1 in the graph display, I noticed several neighbors that had multiple connections. I selected one of them, 134, and viewed the graph for the entire period. Just as with 5 and 306, 1 shared most of its neighbors with 309, yet the two were never in direct contact, so this again appeared to be a match. I repeated the process with 2, using 54 as the well-connected neighbor, which showed that 2 and 397 overlapped in the same way. Repeating the process once more with 3 showed that 3 matched 360.
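A chart like this is driven by per-day counts of calls sent and received for one ID. The sketch below computes those counts and adds a hypothetical `handsOff` check for the pattern seen with 5 and 306: one ID's activity ends before the other's begins. Class and method names are assumptions. The check is deliberately strict; since the calls dropped to near nothing rather than to zero, a real analysis would probably need some tolerance for stray calls.

```java
import java.util.*;

// Hypothetical helper: per-day call timelines for individual IDs.
final class CallTimeline {
    // Assumed record layout: caller, receiver, and day (1..numDays).
    record Call(int from, int to, int day) {}

    // counts[day][0] = calls the ID made (red in the chart),
    // counts[day][1] = calls the ID received (blue). Row 0 is unused.
    static int[][] dailyCounts(List<Call> calls, int id, int numDays) {
        int[][] counts = new int[numDays + 1][2];
        for (Call c : calls) {
            if (c.from() == id) counts[c.day()][0]++;
            if (c.to() == id) counts[c.day()][1]++;
        }
        return counts;
    }

    // Days on which the ID made or received at least one call.
    static SortedSet<Integer> activeDays(List<Call> calls, int id) {
        SortedSet<Integer> days = new TreeSet<>();
        for (Call c : calls) if (c.from() == id || c.to() == id) days.add(c.day());
        return days;
    }

    // True if every active day of `before` precedes every active day of `after`,
    // i.e. the two IDs have complementary, non-overlapping activity windows.
    static boolean handsOff(List<Call> calls, int before, int after) {
        SortedSet<Integer> a = activeDays(calls, before), b = activeDays(calls, after);
        return !a.isEmpty() && !b.isEmpty() && a.last() < b.first();
    }
}
```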
Matching Node 2 to 397 in the force-directed display, using Node 54 as the common neighbor.
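The matching step, finding two IDs that share most of their neighbors but never call each other, can be scored numerically with the Jaccard similarity of their neighbor sets. This is a swapped-in technique offered as a sketch; the analysis above was done visually. `IdMatcher`, `candidates`, and the `minScore` threshold are hypothetical.

```java
import java.util.*;

// Hypothetical helper: score possible ID switches by neighbor overlap.
final class IdMatcher {
    // Jaccard similarity |A ∩ B| / |A ∪ B| of two neighbor sets.
    static double jaccard(Set<Integer> a, Set<Integer> b) {
        if (a.isEmpty() && b.isEmpty()) return 0.0;
        Set<Integer> inter = new HashSet<>(a);
        inter.retainAll(b);
        Set<Integer> union = new HashSet<>(a);
        union.addAll(b);
        return (double) inter.size() / union.size();
    }

    // Rank IDs that were never in direct contact with `id` by how much their
    // neighbor sets overlap with its own; top entries are switch candidates.
    static List<Map.Entry<Integer, Double>> candidates(Map<Integer, Set<Integer>> adj, int id, double minScore) {
        Set<Integer> mine = adj.getOrDefault(id, Set.of());
        List<Map.Entry<Integer, Double>> out = new ArrayList<>();
        for (var e : adj.entrySet()) {
            int other = e.getKey();
            if (other == id || mine.contains(other)) continue; // skip self and direct contacts
            double score = jaccard(mine, e.getValue());
            if (score >= minScore) out.add(Map.entry(other, score));
        }
        out.sort(Map.Entry.<Integer, Double>comparingByValue().reversed());
        return out;
    }
}
```

Excluding direct contacts mirrors the observation that 1 and 309 were never in contact; without that filter, ordinary close associates would also score highly.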
I had now identified the IDs of Ferdinando Catalano's contacts at the end of the period. However, I still needed to determine what happened to Ferdinando Catalano himself. Viewing the nodes surrounding 306 for the entire 10-day period shows that both 306 and 309 contact 300. Examining the neighbors of 300 reveals all of the other central nodes. I therefore concluded that Ferdinando Catalano is now 300. Curiously, Ferdinando was rather inactive on June 10, although he was involved in the network on June 9.
Once I had identified the ID changes for the known personalities, I checked the call frequency charts for each of them and discovered that all of the ID switches occurred on the same day. This suggested an event that affected the entire Paraiso leadership, so I began looking for possible causes of this change in behavior.
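Finding 300 amounts to looking for an ID that is directly connected to many of the new IDs at once. A minimal sketch, assuming an undirected adjacency map built from the call records; `CommonContact` and `rankByCoverage` are illustrative names:

```java
import java.util.*;

// Hypothetical helper: find IDs connected to many members of a group.
final class CommonContact {
    // For every ID outside `nodes`, count how many members of `nodes` it is
    // directly connected to; return those IDs sorted by that count, highest first.
    static List<Integer> rankByCoverage(Map<Integer, Set<Integer>> adj, Collection<Integer> nodes) {
        Map<Integer, Integer> hits = new HashMap<>();
        for (int n : nodes) {
            for (int nb : adj.getOrDefault(n, Set.of())) {
                if (!nodes.contains(nb)) hits.merge(nb, 1, Integer::sum);
            }
        }
        List<Integer> ranked = new ArrayList<>(hits.keySet());
        ranked.sort(Comparator.comparing((Integer id) -> hits.get(id)).reversed());
        return ranked;
    }
}
```

Passing the new IDs (306, 309, 397, 360) would surface any shared contact at the top of the list, which could then be inspected in the graph display.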
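Checking that all switches fall on the same day could be automated by comparing the daily call counts of each old/new ID pair. The sketch below uses the first day on which the new ID is busier than the old one (a "crossover"), which is less sensitive to a few stray calls than taking the old ID's last active day. `SwitchDay`, its methods, and the 1..numDays day numbering are assumptions.

```java
import java.util.*;

// Hypothetical helper: estimate when each old ID was replaced by a new one.
final class SwitchDay {
    // Assumed record layout: caller, receiver, and day (1..numDays).
    record Call(int from, int to, int day) {}

    // First day on which newId is involved in more calls than oldId,
    // or -1 if that never happens.
    static int crossoverDay(List<Call> calls, int oldId, int newId, int numDays) {
        int[] oldCounts = new int[numDays + 1];
        int[] newCounts = new int[numDays + 1];
        for (Call c : calls) {
            if (c.from() == oldId || c.to() == oldId) oldCounts[c.day()]++;
            if (c.from() == newId || c.to() == newId) newCounts[c.day()]++;
        }
        for (int d = 1; d <= numDays; d++) {
            if (newCounts[d] > oldCounts[d]) return d;
        }
        return -1;
    }

    // True if every {oldId, newId} pair crosses over on the same day.
    static boolean allSwitchOnSameDay(List<Call> calls, int[][] pairs, int numDays) {
        Set<Integer> days = new HashSet<>();
        for (int[] p : pairs) days.add(crossoverDay(calls, p[0], p[1], numDays));
        return days.size() == 1 && !days.contains(-1);
    }
}
```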
I started by examining the social network for changes to the secondary personalities. First, I examined the calling patterns of individuals who had multiple contacts that also communicated with one of the central personalities. I noticed that most of these individuals maintained a similar calling pattern, switching from the old ID to the new ID. A few secondary nodes did not re-establish contact between June 7th and June 10th, but this did not seem significant.
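The continuity check for secondary personalities can be sketched as: which secondary IDs talked to the old ID before the switch day and to the new ID on or after it? The `Continuity` class, its parameters, and the choice of switch day are hypothetical; the set of secondary IDs would come from the selection described above.

```java
import java.util.*;

// Hypothetical helper: which secondary contacts followed a central figure's ID switch?
final class Continuity {
    // Assumed record layout: caller, receiver, and day of the period.
    record Call(int from, int to, int day) {}

    // True if the call connects x and y, in either direction.
    static boolean talked(Call c, int x, int y) {
        return (c.from() == x && c.to() == y) || (c.from() == y && c.to() == x);
    }

    // Secondary IDs that talked to oldId before switchDay and to newId on or after it.
    static Set<Integer> followers(List<Call> calls, Set<Integer> secondary, int oldId, int newId, int switchDay) {
        Set<Integer> before = new HashSet<>();
        Set<Integer> after = new HashSet<>();
        for (Call c : calls) {
            for (int s : secondary) {
                if (c.day() < switchDay && talked(c, s, oldId)) before.add(s);
                if (c.day() >= switchDay && talked(c, s, newId)) after.add(s);
            }
        }
        before.retainAll(after); // keep only contacts seen on both sides of the switch
        return before;
    }
}
```

Secondary IDs missing from the result are the ones that did not re-establish contact after the switch.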
Geographic movement of David Vidro and his immediate contacts. Map on left shows locations of ID-1 and immediate contacts on June 1. Map on right shows locations of ID-309 and immediate contacts on June 10.
The cell tower IDs provided another possible source of information. By estimating the coordinates of the towers on the map, I added a geographic display to show the network on any particular day. Using this technique, I discerned some significant geographic changes to the network. David Vidro (ID 1/309) and many of his direct contacts moved significantly after the ID switch. I investigated other central figures, and saw similar movement for each of the personalities. This suggested that the network was reacting to an outside force.
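A node's location on a given day can be approximated from the estimated coordinates of the towers it used, and movement as the distance between two such positions. This sketch assumes each call record stores the tower serving the caller and that tower coordinates were estimated in arbitrary map units; `TowerMovement`, `Point`, and the centroid approach are illustrative choices, not necessarily what the Prajna geographic canvas does.

```java
import java.util.*;

// Hypothetical helper: approximate positions and movement from cell towers.
final class TowerMovement {
    // Assumed: each call records the tower that handled the caller's phone.
    record Call(int from, int to, int day, int tower) {}
    // Estimated tower position on the map, in arbitrary map units.
    record Point(double x, double y) {}

    // Average position of the towers an ID called from on a given day,
    // or null if the ID made no calls from a known tower that day.
    static Point centroid(List<Call> calls, Map<Integer, Point> towers, int id, int day) {
        double sx = 0, sy = 0;
        int n = 0;
        for (Call c : calls) {
            Point p = towers.get(c.tower());
            if (c.from() == id && c.day() == day && p != null) {
                sx += p.x();
                sy += p.y();
                n++;
            }
        }
        return n == 0 ? null : new Point(sx / n, sy / n);
    }

    // Straight-line distance between two positions, possibly under different
    // IDs (e.g. 1 on June 1 and 309 on June 10); NaN if either is unknown.
    static double displacement(List<Call> calls, Map<Integer, Point> towers,
                               int idA, int dayA, int idB, int dayB) {
        Point a = centroid(calls, towers, idA, dayA);
        Point b = centroid(calls, towers, idB, dayB);
        if (a == null || b == null) return Double.NaN;
        return Math.hypot(b.x() - a.x(), b.y() - a.y());
    }
}
```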
Social Network on June 7th, using an experimental graph arrangement. Graph displayed with radius 2 around node 200. Note the phone call activity from subordinates (highlighted in red) at a particular moment in time.
To analyze this hypothesis further, I examined the calling activity prior to the ID switch. I noticed a significant increase in activity on June 7 across the entire network, with almost double the average number of calls. Examining the immediate network around the central personalities, I noticed the same behavior. Using the Prajna graph display with an added temporal slider, I viewed the phone call activity during June 7. The central personalities received quite a few phone calls from their associates. It seems plausible that these subordinates were seeking instructions in reaction to, or in preparation for, an event.
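The network-wide spike can be flagged by comparing each day's call volume with the mean over the period. A minimal sketch; the `factor` threshold (for example 1.5) and the class name `ActivitySpike` are assumptions, and the mean here includes the spike day itself:

```java
import java.util.*;

// Hypothetical helper: flag days with unusually high total call volume.
final class ActivitySpike {
    // Assumed record layout: caller, receiver, and day (1..numDays).
    record Call(int from, int to, int day) {}

    // Days whose total call volume exceeds `factor` times the mean daily volume.
    static List<Integer> spikeDays(List<Call> calls, int numDays, double factor) {
        int[] perDay = new int[numDays + 1];
        for (Call c : calls) perDay[c.day()]++;
        double mean = (double) calls.size() / numDays;
        List<Integer> spikes = new ArrayList<>();
        for (int d = 1; d <= numDays; d++) {
            if (perDay[d] > factor * mean) spikes.add(d);
        }
        return spikes;
    }
}
```

Running the same function on only the calls within the central personalities' immediate network would check whether the local pattern matches the network-wide one.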